The N-Grams Based Text Similarity Detection Approach Using Self-Organizing Maps and Similarity Measures


Similar Resources

Text Similarity Using Google Tri-grams

The purpose of this paper is to propose an unsupervised approach for measuring the similarity of texts that can compete with supervised approaches. Finding the inherent properties of similarity between texts using a corpus in the form of a word n-gram data set is competitive with other text similarity techniques in terms of performance and practicality. Experimental results on a standard data s...


Dependency vs. Constituent Based Syntactic N-Grams in Text Similarity Measures for Paraphrase Recognition

Paraphrase recognition consists in detecting whether an expression restated as another expression contains the same information. Traditionally, several lexical, syntactic, and semantic techniques are used to solve this problem. To measure word overlap, most works use n-grams; however, syntactic n-grams have been scarcely explored. We propose using syntactic dependency and...
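As a rough illustration of the plain word n-gram overlap that the abstract mentions as the common baseline, the sketch below scores two sentences by the share of word n-grams they have in common. It does not implement syntactic n-grams, which would require a parser. The function names `word_ngrams` and `ngram_overlap`, the whitespace tokenization, and the choice of a Jaccard ratio are assumptions made for this example, not details taken from the paper.

```python
def word_ngrams(text, n=2):
    """Return the set of word n-grams in text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(text_a, text_b, n=2):
    """Jaccard ratio of shared word n-grams; an assumed overlap measure."""
    grams_a = word_ngrams(text_a, n)
    grams_b = word_ngrams(text_b, n)
    if not grams_a and not grams_b:
        # Both texts are shorter than n words: treat as no measurable overlap.
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    score = ngram_overlap("the cat sat on the mat", "the cat lay on the mat")
    print(f"bigram overlap: {score:.3f}")
```

A set-based ratio ignores how often an n-gram repeats; a count-based variant would be a reasonable alternative for longer texts.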


Text Reuse Detection using a Composition of Text Similarity Measures

Detecting text reuse is a fundamental requirement for a variety of tasks and applications, ranging from journalistic text reuse to plagiarism detection. Text reuse is traditionally detected by computing similarity between a source text and a possibly reused text. However, existing text similarity measures exhibit a major limitation: They compute similarity only on features which can be derived ...


Binary-based similarity measures for categorical data and their application in Self-Organizing Maps

In exploratory data analysis of high-dimensional data, one of the main tasks is the formation of a simplified overview of data sets. Clustering and projection are among the examples of useful methods to achieve this task. However, there are several types of data where the use of this measure is not adequate, such as the categorical data. In this paper we will review some of the most common binar...
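The abstract is cut off before it names the measures it reviews, so the following is only a minimal sketch of two widely known binary similarity measures, the Jaccard coefficient and the simple matching coefficient, applied to 0/1 vectors. They are shown as familiar examples and are not necessarily the ones the paper covers. The function names and the convention of returning 1.0 when a denominator is zero are assumptions of this example.

```python
def match_counts(x, y):
    """Count agreement patterns between two equal-length binary vectors.

    Returns (a, b, c, d): both 1, only x is 1, only y is 1, both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard coefficient: shared 1s over positions where either is 1."""
    a, b, c, _ = match_counts(x, y)
    denom = a + b + c
    return a / denom if denom else 1.0  # assumed convention for all-zero inputs


def simple_matching(x, y):
    """Simple matching coefficient: agreeing positions over all positions."""
    a, b, c, d = match_counts(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 1.0  # assumed convention for empty inputs


if __name__ == "__main__":
    x, y = [1, 1, 0, 0], [1, 0, 1, 0]
    print(jaccard(x, y), simple_matching(x, y))
```

The two measures differ in whether joint absences (both 0) count as agreement, which matters when categorical data is one-hot encoded and most entries are 0.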


Gauging Similarity with n-Grams: Language-Independent Categorization of Text.

A language-independent means of gauging topical similarity in unrestricted text is described. The method combines information derived from n-grams (consecutive sequences of n characters) with a simple vector-space technique that makes sorting, categorization, and retrieval feasible in a large multilingual collection of documents. No prior information about document content or language is requir...
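The combination described here, character n-grams plus a simple vector-space comparison, can be sketched in a few lines. This is an illustrative sketch only: the helper names `char_ngrams` and `cosine_similarity`, the use of raw counts, the trigram default, and the lowercasing step are assumptions, since the abstract does not give the weighting or normalization the paper actually uses.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (lowercased, whitespace collapsed)."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(profile_a, profile_b):
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * profile_b[gram] for gram, count in profile_a.items())
    norm_a = math.sqrt(sum(c * c for c in profile_a.values()))
    norm_b = math.sqrt(sum(c * c for c in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        # A text shorter than n characters has no n-grams to compare.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = char_ngrams("text similarity with n-grams")
    doc_b = char_ngrams("similarity of texts using n-grams")
    print(f"cosine similarity: {cosine_similarity(doc_a, doc_b):.3f}")
```

Because the profiles are built from characters rather than words, the same code runs unchanged on text in any language, which is the property the abstract emphasizes.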



Journal

Journal title: Applied Sciences

Year: 2019

ISSN: 2076-3417

DOI: 10.3390/app9091870